ABSTRACT

This paper presents statistical information on the
degree of consistency shown by second- and third-grade teachers in
producing student gains in the Metropolitan Achievement Test (MAT)
scores. Data are presented separately for each grade and for 15 Title
1 versus 35 non-Title 1 schools. Included are correlations within
each school year showing teacher consistency in mean residual gains
produced across three successive school years. Although gain scores
were computed with a simple linear model and several key factors
could not be controlled, the stability coefficients obtained compare
favorably with those previously reported and suggest that teacher
consistency may be higher than previously suspected, at least among
experienced teachers working in their usual fashion with their normal
classes. Studies of such teachers who are stable over time in their
relative effectiveness are needed to discover the teacher behaviors
that are related to success in producing student achievement gains.
(Author)
Abstract: This paper presents statistical information on the degree of
consistency shown by second- and third-grade teachers in producing student
gains on the Metropolitan Achievement Tests. Data are presented separately
for each grade and for Title 1 vs. non-Title 1 schools. Included are
correlations within each school year showing teacher consistency in
producing gains across the two sexes and across several subtests of the
Metropolitan Achievement Tests, and stability coefficients showing teacher
consistency in mean residual gains produced across three successive school
years.

Although gain scores were computed with a simple linear model and several key
factors could not be controlled, the stability coefficients obtained compare
favorably with those previously reported and suggest that teacher consistency
may be higher than previously suspected, at least among experienced teachers
working in their usual fashion with their normal classes. Studies of such
teachers who are stable over time in their relative effectiveness are needed to
discover the teacher behaviors that are related to success in producing student
achievement gains.

Research on teacher effectiveness has 
had remarkably little success, given 
the effort expended, in identifying 
the characteristics of effective 
teachers or in specifying teacher be- 
haviors associated with success in 
producing student achievement gains. 
Morsh and Wilder (1954) reviewed the 
literature through 1952 and concluded 
that no specific teacher behavior was 
invariably and significantly corre- 
lated with student achievement gains. 
Later reviews (Gage, 1963; Jackson
and Getzels, 1963) reached similar
conclusions.

More recent reviews (Flanders and Simon, 1969; Rosenshine and Furst,
1971) are noting methodological advances and some consistency in
findings relating teacher behavior to student achievement. Even the
teaching variables they cite, however, do not show strong relationships
with measures of student achievement gains. Their optimism is based
upon agreement across several studies showing significant but weak
relationships, rather than strong and clear-cut relationships. Thus,
the search for effective teachers and the attempt to identify effective
teaching behavior have not produced much. Those positive findings which
have appeared are relatively weak ones.

*The author wishes to acknowledge and thank Thomas Good and Donald
Veldman for their suggestions regarding the overall conceptualization
of the research; Edmund Emmer, Earl Jennings, and Donald Veldman for
their statistical design suggestions; Edmund Emmer, Thomas Good, and
Barak Rosenshine for their comments on an earlier version of the paper;
Carolyn Evertson, Kathey Paredes, Kathy Senior, and Jon Sheffield for
their help in data preparation and analysis; and Susan Florence and
Karen Mays for their help in manuscript preparation.

This paper is an expanded version of a paper entitled "Stability in
Teacher Effectiveness" delivered by the author at the annual meeting of
the American Educational Research Association, 1972.

Partly for this reason many observers 
now hold that teaching is an extreme- 
ly complex art and that what is effec- 
tive teaching will vary with the stu- 
dent and the situation, so that any 
search for generally effective teach- 
ers or effective teaching behavior is 
doomed to failure from the start. 

This view is especially common among those who reject student
achievement gain as an important measure of teacher effectiveness,
although many who do stress this criterion also hold this view because
of researchers' continued failure to demonstrate clear relationships
between teaching behavior measures and student gain measures.

This research assumes that student achievement gain is an important
criterion of teaching effectiveness. It is not seen as the only
criterion, or even necessarily as the prepotent one, but it is assumed
to be an important criterion for judging teacher effectiveness.
Furthermore, it is especially useful for research purposes because, in
contrast to criteria such as promoting social development and improving
motivation and self concept, it is more easily and reliably measured.



Consequently, student achievement gain 
was selected as the criterion for 
teacher effectiveness for this re- 
search. When the terms "teacher effec- 
tiveness" or "effective teaching" are 
used, they refer solely to effective- 
ness in producing gains in measured 
achievement. 

As previously noted, even studies which have used student achievement
gain as the criterion of teacher effectiveness have not produced clear
results. One reason is that few studies have included both measures of
teacher behavior and measures of subsequent student gain in the same
research. Instead, many have used high-inference ratings or other
non-behavioral teaching measures, or have used something other than
measured student gain as the effectiveness criterion. Mitzel and Gross
(1958) reviewed several studies that separated teachers on their
relative effectiveness in producing student gain and then sought to
identify teaching behavior which was associated with success in
producing gains. They concluded that effective and ineffective teachers
could not be identified unequivocally and that no particular teaching
behavior was consistently related to effectiveness.

Part of the reason for this was that many studies used student
teachers, first-year teachers, or teachers involved in a special
experimental study. Furthermore, such studies were usually confined to
one school year at the most. When teachers are inexperienced or when
they are involved in a new and special experimental program, their
classroom behavior is likely to be unstable. Furthermore, without
replication or repeated measurements of effectiveness across several
samplings, it cannot be known whether such teachers will be stable in
their relative effectiveness in producing student gains.

The serious nature of this problem was noted in a recent review by
Rosenshine (1970), who could locate only five long-term studies which
included stability coefficients reflecting consistency across time on
measures of teacher effectiveness. These five studies are very
different from and difficult to compare with one another; but with one
exception, the stability coefficients in these studies were generally
low, often near zero.

As far as they go, the available data 
may seem to support the skeptical view 
that teaching effectiveness is a com- 
plex, elusive art ill-suited for 
scientific study. However, the avail- 
able data are really not appropriate 
as a basis for drawing such conclu- 
sions about typical teachers, since 
these data come mostly from atypical 
teaching situations. 

How stable is the effectiveness of 
typical teachers? Are some teachers 
more stable than others? The present 
study was an attempt to answer such 
questions with more appropriate data 
than have been used in the past. In 
particular, the problem of stability 
in teacher effectiveness was addressed 
by studying ordinary teachers who were 
working with their regular classes in 
their usual fashion (no experimental
intervention was involved), and by ex-
tending the scope of the research over
a time period long enough to allow us
to reasonably judge stability (three
full school years).





The raw data for this study, obtained with the cooperation of a city
school system, consisted of individual students' scores on the
Metropolitan Achievement Tests, which are administered each fall.
Although records for all grades were available, resource limitations
required a selection of only a part of the data. On the assumption that
student achievement gain is generally more acceptable as an important
criterion of teacher effectiveness in the earlier grades, the study was
focused on the early elementary grades rather than the intermediate or
secondary grades. The first grade was dropped from consideration,
however, because satisfactory pre-scores were not available at this
grade. The children do take the Metropolitan Readiness Tests, but these
are known to be heavily influenced by differences in home environments,
and they do not have a direct continuity with the later Metropolitan
Achievement Tests. For these and other reasons, it seemed prudent to
avoid the first grade rather than use readiness scores as pre-scores.

Therefore, the second and third grades were selected for study. The
Metropolitan Achievement Test scores from the fall of the second grade
were used as pre-scores for the second grade, and the scores from the
tests given in the fall of the third grade were used as post-scores.
Similarly, the tests given in the fall of the third grade were used as
pre-scores for the third grade and the tests given in the fall of the
fourth grade were used as post-scores. These data were compiled for the
school years beginning in 1967, 1968, and 1969.






TO REPLACE PAGE 4 AND PAGE 5 (UP TO THE "RESULTS" SECTION) OF REPORT NO. 77,
"STABILITY IN TEACHER EFFECTIVENESS," RESEARCH AND DEVELOPMENT CENTER FOR
TEACHER EDUCATION, THE UNIVERSITY OF TEXAS AT AUSTIN.

The study included all teachers in these two grades who were teaching
at the same grade level for all three years. All available data for children
in their classes during these three years were recorded. Grade level
equivalents rather than raw scores were used, since these are cruder and more
normalized measures likely to contain less error variance than the raw scores.
Data were available from 15 Title I schools and 35 non-Title I schools. These
were two separate sets of data, since different tests were used in the two
types of schools.

The Title I schools used the Primary I battery of the Metropolitan
Achievement Tests (copyright 1958) in grade 2, the Primary II battery
(copyright 1958) in grade 3, and the elementary battery (copyright 1959) in
grade 4. This posed no problem for the word knowledge, word discrimination,
and reading subtests, since these appear in all three batteries. However, the
Primary I battery contains only a single arithmetic subtest (arithmetic
concepts and skills), while the Primary II and the elementary batteries
contain two (arithmetic computation and arithmetic concepts and problem
solving). The school records, however, contained only information on the
composite total arithmetic score for the Primary II battery.

Thus the second grade teachers in Title I schools have only one set
of data for arithmetic, based upon the arithmetic concepts and skills subtest
of the Primary I battery (pretest) and the composite total of the Primary II
battery (posttest). In this case the pretest was primarily arithmetic
computation, while the posttest composite combined computation and reasoning.

Third grade teachers in Title I schools have two sets of gain data
for arithmetic, but both sets of post-scores were adjusted using the same set
of prescores as covariates. Since only the composite total arithmetic score
was available from the Primary II battery as an arithmetic pretest, it was
used as a covariate in computing residual gain scores for both the arithmetic
computation subtest and the arithmetic concepts and problem solving subtest
of the elementary battery given in fourth grade. A similar procedure was
used for the grade 2 teachers in non-Title I schools, since the Primary II
battery was used in grade 2 and the elementary battery in grade 3. In these
cases, a composite total score was used as a prescore covariate for two more
specific postscores. Although these covariate controls were less satisfactory
than controls using the same subtest would have been, they were used
nevertheless, because a partially satisfactory covariate was preferable to
the use of raw gain scores.

Computation of residual gains for non-Title I grade 3 classes was
simpler, since the elementary battery used in grades 3 and 4 contains the same
five subtests (word knowledge, word discrimination, reading, arithmetic
computation, and arithmetic concepts and problem solving). Thus both pre- and
post-scores were available on the same subtests for these teachers' classes.

Residual gain scores were first computed for each student within sex,
since girls generally outperform boys at these ages, and within each of the
three years to guard against any systematic yearly difference. Thus, for
example, the five residual gain scores for a second grade boy in a non-Title I
school in 1969 were based on the five respective distributions of pre- and
postscores for all boys in second grade in 1969 in the classes of teachers in
non-Title I schools who were included in the sample. Using the five respective
prescores as covariates, residual gain scores on each of these five subtests
were computed for each student, using a linear model where g = y - (a + bx).
Here g is the residual gain, y the observed postscore, x the prescore, and
a + bx the postscore predicted from the prescore by the regression line fitted
within the student's comparison group.
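The residual gain computation can be illustrated with a minimal sketch. It assumes an ordinary least-squares line fitted separately within each comparison group (for example, one sex within one year and school type); the function names, the record layout, and the `group` key are hypothetical and are not taken from the report.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x.

    Assumes at least two distinct prescores (otherwise the slope is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_gains(records):
    """Return g = y - (a + b*x) for every record, in input order.

    records: list of dicts with hypothetical keys 'group', 'pre', 'post'.
    A separate line is fitted within each group, so residuals are relative
    to students of the same sex, year, and school type.
    """
    by_group = defaultdict(list)
    for index, record in enumerate(records):
        by_group[record["group"]].append(index)

    gains = [0.0] * len(records)
    for indexes in by_group.values():
        xs = [records[i]["pre"] for i in indexes]
        ys = [records[i]["post"] for i in indexes]
        a, b = fit_line(xs, ys)
        for i in indexes:
            gains[i] = records[i]["post"] - (a + b * records[i]["pre"])
    return gains
```

Because each line is fitted within its own group, the residuals in every group average to zero, so a positive residual means only that a student gained more than comparable students with the same prescore.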

Data for teachers were then compiled by computing mean residual gain
scores for each of their three respective classes. Within each subtest, a
mean was computed for each sex and for the class as a whole, for each teacher
for each of the three years under study. These analyses resulted in four sets
of data: gain scores for second (N = 34) and third grade (N = 26) teachers in
Title I schools, and gain scores for second (N = 54) and third grade (N = 51)
teachers in non-Title I schools. Correlational analyses of these four data
sets (mean residual gains) were then used to investigate consistency across
subtests within year and stability across the two sexes and the three years
within subtest.
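The compilation of class means can be sketched as follows. This is an illustration only: the tuple layout `(teacher, year, sex, gain)` and the sex labels `"M"` and `"F"` are assumptions made for the example, not the format of the original records.

```python
from collections import defaultdict


def mean(values):
    """Arithmetic mean, or None when there are no values."""
    return sum(values) / len(values) if values else None


def class_means(rows):
    """Mean residual gain per teacher and year, by sex and for the whole class.

    rows: iterable of (teacher, year, sex, gain) tuples for one subtest,
    where sex is "M" or "F" (hypothetical labels).
    Returns {(teacher, year): {"boys", "girls", "all", "n_boys", "n_girls", "n"}}.
    """
    buckets = defaultdict(lambda: {"M": [], "F": []})
    for teacher, year, sex, gain in rows:
        buckets[(teacher, year)][sex].append(gain)

    summary = {}
    for key, by_sex in buckets.items():
        everyone = by_sex["M"] + by_sex["F"]
        summary[key] = {
            "boys": mean(by_sex["M"]),
            "girls": mean(by_sex["F"]),
            "all": mean(everyone),
            "n_boys": len(by_sex["M"]),
            "n_girls": len(by_sex["F"]),
            "n": len(everyone),
        }
    return summary
```

Keeping the student counts alongside the means makes it straightforward to apply class-size cutoffs before any correlations are computed.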










RESULTS 



CONSISTENCY ACROSS SUBTESTS 

Correlations among the mean residual gain scores within each of the
three years are shown in Table 1. Some teachers were excluded from these
analyses because of seriously incomplete data. These were teachers
working in schools having a high rate of pupil turnover, so that only a
minority of students present for testing one year were back again the
following year. Since the remaining students in such classes are likely
to be a non-random sample, a cutoff point of 14 was established
(arbitrarily). When data on fewer than 14 students from a given class
were available, the class was excluded from the analyses.

TABLE 1. CORRELATIONS OF MEAN ADJUSTED GAIN SCORES ON THE SUBTESTS OF THE
METROPOLITAN ACHIEVEMENT TEST WITHIN YEARS

Grade 2, Title 1 Schools (Possible N = 34)¹

Subtests    1967-1968    1968-1969    1969-1970
            (N = 30)     (N = 27)     (N = 27)
WK, WD      .71***       .64**        .69***
WK, R       .90***       .80***       .55**
WK, A       .73***       .58**        .33*
WD, R       .71***       .55**        .67***
WD, A       .56***       .60***       .15
R, A        .67***       .42*         .33*

Grade 2, non-Title 1 Schools (Possible N = 54)¹

Subtests    1967-1968    1968-1969    1969-1970
            (N = 49)     (N = 43)     (N = 36)
WK, WD      .58***       .74***       .67***
WK, R       .62***       .69***       .57***
WK, AC      .26*         .28*         .41**
WK, AR      .41**        .39**        .51**
WD, R       .55***       .72***       .71***
WD, AC      .03          .11          .46**
WD, AR      .31*         .31*         .52**
R, AC       .17          .31*         .47**
R, AR       .30*         .51***       .51**
AC, AR      .68***       .64***       .80***

Grade 3, Title 1 Schools (Possible N = 26)¹

Subtests    1967-1968    1968-1969    1969-1970
            (N = 26)     (N = 24)     (N = 22)
WK, WD      .88***       .62**        .55**
WK, R       .92***       .89***       .59**
WK, AC      .03          .52**        .30
WK, AR      .57**        .59**        .31
WD, R       .87***       .61**        .53**
WD, AC      .24          .29          .35
WD, AR      .73***       .37*         .41
R, AC       .13          .58**        .50**
R, AR       .67***       .66***       .57**
AC, AR      .53**        .72***       .83***

TABLE 1. (Continued)

Grade 3, non-Title 1 Schools (Possible N = 51)¹

Subtests    1967-1968    1968-1969    1969-1970
            (N = 46)     (N = 42)     (N = 45)
WK, WD      .70***       .62***       .63***
WK, R       .65***       .43**        .45**
WK, AC      .65***       .48**        .36**
WK, AR      .70***       .40**        .56***
WD, R       .54***       .38**        .37**
WD, AC      .60***       .41**        .44**
WD, AR      .67***       .26*         .45**
R, AC       .63***       .30*         .26*
R, AR       .63***       .39**        .28*
AC, AR      .71***       .71***       .70***

WK = word knowledge; WD = word discrimination; R = reading; A = arithmetic;
AC = arithmetic computation; AR = arithmetic concepts and problem solving.

¹Classes for a given year were excluded if data were not available for at
least 14 students.
  * p < .05
 ** p < .01
*** p < .001

Inspection of Table 1 shows that most correlations across subtests were
moderate to high (.40 - .80). As would be expected, most of the low
correlations that do appear are between language arts subtests and
arithmetic subtests. Thus teachers are more consistent in producing
gains within these two general curriculum areas than across them;
success in producing language arts gains does not always imply success
in producing arithmetic gains.

Within the subset of three language arts subtests the reading test was
more pivotal, usually correlating higher with word knowledge and word
discrimination than the latter tests correlated with each other. Also,
as might have been expected, the language arts tests regularly
correlated higher with arithmetic concepts and problem solving (which
involved reading arithmetic problems) than with arithmetic computation,
a more purely mathematical test.



CONSISTENCY ACROSS SEX OF STUDENT 

Correlations of the teachers' mean residual gains for boys with their
mean residual gains for girls were computed within each subtest for
each year, using an arbitrary cutoff of seven or more students in each
sex group as the basis for including a given class in the analyses.
These correlations were all very high, approaching 1.00. Thus
individual (female) teachers do not tend to be differentially effective
with boys vs. girls, although, as previously noted, girls generally
outperform boys in American schools at these grades.

This conclusion was confirmed by an informal analysis of each teacher's
mean residual gains for boys and for girls, within each subtest and
year. Of 68 teachers with a full data set, only four showed sizable and
consistent sex differences in residual gain means. Two of these did
better with boys and two with girls (see Appendix A).

These data are consistent with data from several sources showing that,
despite frequent claims to the contrary, female teachers are not
typically biased towards girls and against boys (Brophy and Good, 1973).

Correlations across the three years for mean residual gains within each
of the subtests are presented in Table 2. Again, an arbitrary cutoff of
14 students was used in excluding certain classes from the analyses.

Although there are a few exceptions, correlations between contiguous
years tend to be .25 or higher. They generally compare favorably with
the figures obtained in the five long-term studies reviewed by
Rosenshine (1970), at least for three of the four samples.
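A stability coefficient of this kind can be sketched as a Pearson correlation between teachers' mean residual gains in two years, restricted to classes that meet the 14-student cutoff stated in the text. The dictionary layout and function names below are assumptions made for illustration; the report does not describe its computing procedure in this detail.

```python
from math import sqrt

MIN_STUDENTS = 14  # cutoff stated in the text


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences.

    Assumes at least two pairs and non-zero variance in both sequences.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def stability(year_a, year_b, min_students=MIN_STUDENTS):
    """Correlate teachers' mean residual gains across two years.

    year_a, year_b: {teacher: (mean_residual_gain, n_students)} for one
    subtest (hypothetical layout). Only teachers with at least
    min_students in both years are included.
    Returns (r, number_of_teachers_included).
    """
    teachers = [
        t for t in year_a
        if t in year_b
        and year_a[t][1] >= min_students
        and year_b[t][1] >= min_students
    ]
    xs = [year_a[t][0] for t in teachers]
    ys = [year_b[t][0] for t in teachers]
    return pearson_r(xs, ys), len(teachers)
```

Returning the number of teachers included alongside r mirrors the way each coefficient is reported with its own N, since the cutoff removes different classes for different pairs of years.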

TABLE 2. STABILITY COEFFICIENTS ACROSS THREE YEARS FOR MEAN ADJUSTED GAIN SCORES
ON THE SUBTESTS OF THE METROPOLITAN ACHIEVEMENT TESTS

GRADE 2

Title 1 Schools (Possible N = 34)¹

Subtest    Years 1-2    Years 2-3    Years 1-3
           (N = 26)     (N = 22)     (N = 24)
WK         .49**        .18          .10
WD         .26          .28          .30
R          .31          .00          -.05
A          .24          -.12         -.03

Non-Title 1 Schools (Possible N = 54)¹

Subtest    Years 1-2    Years 2-3    Years 1-3
           (N = ?)      (N = 36)     (N = 37)
WK         .41**        [illegible]  .45**
WD         .63***       .42**        .50**
R          .40**        .42**        .43**
AC         .34*         .45**        .06
AR         .35*         .33*         .42**

GRADE 3

Title 1 Schools (Possible N = 26)¹

Subtest    Years 1-2    Years 2-3    Years 1-3
           (N = 24)     (N = 20)     (N = 22)
WK         .52**        .78***       .65***
WD         .28          .19          .02
R          .54**        .39*         .08
AC         .39*         .51*         .55**
AR         .28          .32          -.19

Non-Title 1 Schools (Possible N = 51)¹

Subtest    Years 1-2    Years 2-3    Years 1-3
           (N = 44)     (N = 42)     (N = 41)
WK         .39**        .45**        .40**
WD         .30*         .26*         .26*
R          .26*         -.07         .10
AC         .61***       .65***       .44**
AR         .41**        .46**        .32*

¹Classes for a given year were excluded if data were not available for at
least 14 students.
  * p < .05
 ** p < .01
*** p < .001

The reasons why the stability coefficients for the second-grade
teachers in Title 1 schools are lower than those for the other three
groups are not known, although they may be due to a combination of the
age and degree of cognitive development of the children. That is,
children from economically disadvantaged families develop in an
environment that is less conducive to the stimulation of full
development of their cognitive potential than children from more
advantaged families (Hess, 1970). Thus their learning potential is more
variable when they enter school, and it may take an extra year before
they begin to perform consistently, establishing a relatively stable
level of achievement with respect to their classmates.

This is merely one possibility, however; other explanations, such as
sampling error or some unknown difference between second-grade teachers
in Title 1 schools and teachers in the other three groups, are also
possible.

The generally higher coefficients in Table 1 as compared to Table 2
show that a yearly "student cohort" or "class" effect exists, despite
the controls involved in using residual rather than raw gains and
despite the decision to compute gains within each of the three years
separately. While it is possible that the explanation for this resides
primarily with the teacher (a given teacher vacillates in effectiveness
across years, other things being equal), it seems much more probable
that class variables such as general motivation and classroom
atmosphere exert some effect on the achievement of the class as a whole
in a given year. Also, factors such as illness and personal problems in
both teachers and students are more likely to exert constant effects on
achievement within a single year than across two or more years.

IDENTIFYING CONSISTENT 
TEACHERS 

The data presented above represent the first step in a planned series
of investigations on teaching effectiveness. The correlational analyses
are group data. While they are useful for showing that considerable
stability is evident in the general sample, they mask individual
differences among teachers. The next step in the research involved
selection of consistent teachers for further study.

Although they can be operationally defined, terms like "stable" and
"consistent" are relative, subjectively defined terms. One observer,
using loose criteria, might label a particular teacher as consistent,
while another, using tighter criteria, might see the same teacher as
inconsistent. This section presents the author's impressions of the
degree of consistency in the samples, based on his own subjective
criteria. Appendices A and B are provided for those readers who wish to
approach the data with their own criteria.

There are at least three kinds of con- 
sistency that can be explored in the 
data of Appendix B. The first, repre- 
sented by the standard deviations, in- 
volves the variability in gains within 
each class. Small standard deviations 
mean that the teacher tends to produce 
equivalent gains in all her students, 
while larger ones suggest relatively 
large gains by some students and small
ones by others. This aspect of con- 
sistency, though interesting, was not 
used as a criterion for selecting 
teachers for two reasons. 

First, the criterion appears to be overly restrictive. If a class were
homogeneously grouped, deliberately or randomly, variability would be
reduced. The school system did not practice homogeneous grouping
officially, but it probably occurred unofficially at some schools.
Also, even with statistical control via the residualizing process,
individual differences among students are bound to have some effect.
"Late bloomers" and children whose pre-scores were artificially
depressed (due to illness, emotional problems, or other factors which
hampered learning the previous year) will show greater than expected
gains, on the average, in their residual scores. Similarly, students
with prolonged illness or emotional problems in a given year will gain
less than expected, despite the teacher's efforts. Since individual
differences of this sort were not controlled, the standard deviations
are less useful as criteria of consistency in teachers.

The second reason is practical: probably for the reasons just
described, the standard deviations show great variability. A few
teachers regularly show very high standard deviations, but there is no
corresponding group showing consistently low standard deviations. Thus
there is no subgroup of teachers who are consistent by this criterion.

The other two criteria of consistency are based on the teachers' mean
residual gain scores for each of the three years. The first criterion
is linear constancy across the three means for a given subtest. When
this appears, the teacher shows approximately the same mean residual
gain for each of the three years (for example, .27, .31, and .29). This
is the most widely accepted, common sense criterion of teacher
consistency.

A somewhat different kind of consistency is seen in the data for
teachers who show a linear trend across the three years. Such teachers
show a pattern of either improvement (.03, .24, .42) or decline (.38,
.16, -.08) in effectiveness. While this is not stability in the sense
of linear constancy, it does represent a form of consistency.

The author's judgments concerning con- 
sistency are indicated in the symbols 
included in Appendix B. Whenever data
for a given subtest were available for 
all three years, one of four consis- 
tency symbols was assigned: 

1. A horizontal line ( — ) indicates 
linear constancy;

2. A rising line (/) indicates
linear improvement;

3. A falling line (\) indicates
linear decline;

4. An angular line (^) indicates a
non-linear trend. 

Overall, 28% of these symbols indicate
linear constancy; 13%, linear improve-
ment; 11%, linear decline; and 49%,
non-linearity. Thus about half of the
assigned symbols indicate some form of 
general consistency. 
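The four symbols were assigned by the author's subjective judgment, and the report gives no numeric rule for them. Purely as an illustration of how such a rule might be written down, the sketch below classifies a three-year pattern using one hypothetical tolerance, `tol`; both the rule and the default value of 0.10 are assumptions, not the author's criteria.

```python
def classify_pattern(g1, g2, g3, tol=0.10):
    """Label a teacher's three yearly mean residual gains on one subtest.

    tol is a hypothetical tolerance, in the same units as the gains.
    - "constant":   all three means lie within tol of one another
    - "improving":  both yearly steps are upward and differ by at most tol
    - "declining":  both yearly steps are downward and differ by at most tol
    - "non-linear": anything else
    """
    if max(g1, g2, g3) - min(g1, g2, g3) <= tol:
        return "constant"

    step1 = g2 - g1
    step2 = g3 - g2
    steps_similar = abs(step1 - step2) <= tol

    if step1 > 0 and step2 > 0 and steps_similar:
        return "improving"
    if step1 < 0 and step2 < 0 and steps_similar:
        return "declining"
    return "non-linear"
```

With the default tolerance, this rule labels the example patterns given in the text as constant (.27, .31, .29), improving (.03, .24, .42), and declining (.38, .16, -.08). A different tolerance would move borderline teachers between categories, which is the point made above about loose versus tight criteria.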

The 49% showing non-linearity are not necessarily inconsistent,
however. Certain teachers show a sharp change after the first year
(.03, .38, .40) or after the second year (.05, .02, .36). While many
fluctuations are merely error variance, some of the changes that appear
to be non-linear or random in the data from three years would form a
systematic pattern if later data were available. For example, compare
the three-year pattern shown above (.03, .38, .40) with a hypothetical
five-year pattern for the same teacher (.03, .38, .40, .44, .42). The
latter shows the teacher attaining a linearly constant, high level of
effectiveness after some initial mediocrity.
Given the probable masking of some consistency by the limitations of
the data, and given the many uncontrolled factors operating to affect a
teacher's effectiveness with her class during a given year, it seems
likely that the present data, to the extent that they may be in error,
are underestimating the consistency of teacher effectiveness.



In addition to the probable sources of error already mentioned, it
should be noted that the residual gain scores used were computed with
simple linear regression models which tend to underestimate the
expected gain for high achievers and overestimate it for low achievers.
As a result, a teacher with a low ability group is penalized and a
teacher with a high ability group is over-credited when mean residual
gains are computed. Although the school district did not officially
practice ability grouping, it is likely that certain schools did so
unofficially, and student aptitude was not directly measured or
controlled in this research. This would produce instability in mean
gain scores whenever a teacher who had a high group one year had an
average or low group the next, or vice versa.

The findings indicate that, at least in grades two and three, teachers
who are consistent in their relative effectiveness can be identified.
As noted, the vast majority of these consistent teachers are about
equally successful with boys as with girls. Some are also about equally
successful across three years in producing gains on the four or five
Metropolitan subtests on which data are available. Others show a more
complex kind of consistency, such as producing high gains in language
arts and low gains in math, or vice versa. Observational studies of
these consistent teachers, done in the naturalistic setting as they
carry out their normal activities with their own students, should yield
greater payoff than the kinds of teacher effectiveness research done in
the past.